Anomalous diffusion in a symbolic model 
We investigate some statistical properties of symbolic sequences generated by a numerical procedure
in which the symbols are repeated following a power-law probability density. In this analysis, we
consider that the sum of n symbols represents the position of a particle in erratic movement. This
approach reveals a rich diffusive scenario characterized by non-Gaussian distributions and, depending
on the power-law exponent and also on the procedure used to build the walker, we may have
superdiffusion, subdiffusion or usual diffusion. Additionally, we use the continuous-time random walk
framework to compare with the numerical data, finding good agreement. Because of its simplicity and
flexibility, this model can be a candidate to describe real systems governed by power-law probability
densities.

PACS numbers: 05.40.Fb,02.50.-r,05.45.Tp 
I. INTRODUCTION

Studies of complex systems are widespread in the physics community [H-0| and a large number of these
investigations deal with records of real numbers ordered in time or in space. Based on these time series, the aim of
these works is to extract features, patterns or laws that govern a given system. There is an extensive literature
of statistical tools devoted to analyzing time series. For instance, detrended fluctuation analysis (DFA) [f| can be
used to examine the presence of correlations in the data. However, many of these analyses do not focus on the
original data but on sub-series such as the absolute value, the return value or the volatility series.

In particular, a time series can be converted into a symbolic sequence by using a discrete partition of the data
domain and assigning a symbol to each site of the partition, a technique known as symbol statistics [7]. A priori,
any data set can be mapped into a symbolic sequence by using a specific rule (see for instance Ref. [8]). A typical
analysis performed on such a symbolic sequence is to evaluate its block entropy. This approach measures the amount of
information contained in a block or the average information necessary to predict subsequent symbols. This analysis
has been applied to a wide range of topics, including DNA sequences [9]. In this context, Buiatti et al. [10] introduced a
numerical model which generates long-range correlations among the symbols of a symbolic sequence, leading to a slow
growth of the usual block entropy.

Motivated by this anomalous behavior of the block entropy, our main goal is to construct a diffusive process
based on these sequences. The diffusive processes generated with these sequences are expected to be Markovian or
non-Markovian depending on the conditions imposed on these sequences. For Markovian processes or short-term
memory systems, the mean square displacement grows linearly in time. On the other hand, non-Markovian processes
or long-term memory systems often display deviations from this linear behavior, being better described by a power law
in time with exponent $\alpha$. This is the fingerprint of anomalous diffusion and, depending on the value of $\alpha$, we
may have superdiffusion ($\alpha > 1$) or subdiffusion ($\alpha < 1$), while for $\alpha = 1$ the usual spreading is recovered. Several
physical systems exhibit this power-law pattern, for instance porous substrates [11], diffusion of high molecular weight
polyisopropylacrylamide in nanopores [12], highly confined hard disk fluid mixtures [13], fluctuating particle fluxes [14],
diffusion on fractals [15], ferrofluids [16], nanoporous materials [17], and colloids [18].

In this context, the model proposed by Buiatti et al. has an essential ingredient leading to anomalous diffusion: the
long-term memory present in their symbolic sequences. We will show that a diffusive process based on these sequences
leads to a rich diffusive picture, where ballistic diffusion, superdiffusion, subdiffusion, and also usual diffusion can
emerge, depending on the model parameters and on the mode of construction of the process. In addition, we compare
our numerical results with analytic models based on the continuous-time random walk [19-28].

This paper is organized as follows. In Section II we present and review some properties of the model. Section
III is devoted to defining erratic trajectories from the sequences as well as to investigating their diffusive behavior. A
comparative analysis of this diffusive aspect with the continuous-time random walk viewpoint is performed in Section IV.
Finally, we end with a summary and some concluding comments.



II. THE MODEL 



The original model [10] is a numerical experiment that generates equally likely symbols which are repeated along
the sequence following a power-law probability density. In order to describe the model, let $A = \{a_1, \ldots, a_n\}$ be
the set of symbols and $Q = \{Q_1, Q_2, \ldots, Q_N\}$ represent a sequence where $Q_i \in A$. To specify each $Q_i$, we initially
select randomly one of the symbols of the alphabet $A$ and repeat it $N_y$ times inside the sequence, in such a way that
$Q_i = Q_{i+1} = \cdots = Q_{i+N_y-1}$. The number $N_y$ is obtained from

$$N_y = [y] + 1 \quad \text{with} \quad y = \Lambda \left[ \frac{1}{(1-\eta)^{1/(\mu-1)}} - 1 \right], \qquad (1)$$

where $\Lambda > 0$ and $\mu > 1$ are real parameters, $\eta$ is a random variable uniformly distributed in the interval $[0,1]$, and $[y]$
denotes the integer part of $y$. By using this procedure, a typical symbolic sequence with $N = 10$ and $A = \{-1, 1\}$ is



Q 
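The construction rule of equation (1) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names `block_length` and `symbolic_sequence`, the argument `lam` (standing for the scale parameter of equation (1)) and the choice to truncate the last block so that the sequence has exactly the requested length are all assumptions made here.

```python
import random

def block_length(lam, mu, rng):
    """Draw N_y = [y] + 1, with y sampled as in equation (1)."""
    eta = rng.random()  # uniform in [0, 1), so 1 - eta is never zero
    y = lam * ((1.0 - eta) ** (-1.0 / (mu - 1.0)) - 1.0)
    return int(y) + 1  # int() truncates, i.e. the integer part for y >= 0

def symbolic_sequence(length, alphabet, lam, mu, seed=None):
    """Build a sequence of `length` symbols out of equally likely blocks."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < length:
        symbol = rng.choice(alphabet)
        # Assumption of this sketch: the last block is cut so that the
        # sequence has exactly `length` symbols.
        repeats = min(block_length(lam, mu, rng), length - len(sequence))
        sequence.extend([symbol] * repeats)
    return sequence

if __name__ == "__main__":
    print(symbolic_sequence(10, [-1, 1], lam=1.0, mu=2.5, seed=0))
```

For values of $\mu$ very close to 1 the power in `block_length` can exceed the floating-point range, so the sketch is meant for moderate exponents only.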



Since $\eta$ is a uniformly distributed random number, $y$ will be a positive random number. Moreover, $p(y)$ can be
calculated in a straightforward manner, leading to

$$p(y) = (\mu - 1)\, \frac{\Lambda^{\mu-1}}{(\Lambda + y)^{\mu}}. \qquad (2)$$
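For completeness, the step from equation (1) to equation (2) can be written out as a change of variables. The sketch below assumes the scale parameter of equation (1) is written as $\Lambda$ and uses the fact that $\eta$ has unit density on $[0,1]$ and increases monotonically with $y$.

```latex
\begin{align*}
\frac{y}{\Lambda} + 1 = (1-\eta)^{-1/(\mu-1)}
\quad &\Longrightarrow \quad
\eta(y) = 1 - \left(\frac{\Lambda}{\Lambda+y}\right)^{\mu-1}, \\
p(y) = \frac{d\eta}{dy}
&= (\mu-1)\,\frac{\Lambda^{\mu-1}}{(\Lambda+y)^{\mu}}, \qquad y \ge 0 .
\end{align*}
```

For large $y$ this gives $p(y) \sim y^{-\mu}$, which is the power-law tail referred to in the text.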



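Therefore, the model basically consists of blocks of $N_y$ repeated symbols, with $N_y$ distributed according to a
power law of exponent $\mu$ in the asymptotic limit. Furthermore, the first and the second moments of $p(y)$ are given by

$$\langle y \rangle = \int_0^{\infty} y\, p(y)\, dy = \frac{\Lambda}{\mu - 2} \quad (\text{for } \mu > 2) \qquad (3)$$

and

$$\langle y^2 \rangle = \int_0^{\infty} y^2\, p(y)\, dy = \frac{2\Lambda^2}{(\mu - 2)(\mu - 3)} \quad (\text{for } \mu > 3). \qquad (4)$$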
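As a quick numerical sanity check of equations (3) and (4), one can compare the closed forms with sample averages of $y$ drawn by inverse-transform sampling. The following is a minimal sketch under stated assumptions: the helper names, the sample size and the seed are hypothetical, and the comparison is only meaningful for $\mu > 3$, where both moments are finite.

```python
import random

def sample_y(lam, mu, rng):
    """Inverse-transform sample of y, following equation (1)."""
    return lam * ((1.0 - rng.random()) ** (-1.0 / (mu - 1.0)) - 1.0)

def analytic_moments(lam, mu):
    """First and second moments from equations (3) and (4); needs mu > 3."""
    if mu <= 3.0:
        raise ValueError("both moments are finite only for mu > 3")
    mean = lam / (mu - 2.0)
    second = 2.0 * lam ** 2 / ((mu - 2.0) * (mu - 3.0))
    return mean, second

def sampled_moments(lam, mu, n_samples=200_000, seed=0):
    """Monte Carlo estimates of <y> and <y^2>."""
    rng = random.Random(seed)
    total, total_sq = 0.0, 0.0
    for _ in range(n_samples):
        y = sample_y(lam, mu, rng)
        total += y
        total_sq += y * y
    return total / n_samples, total_sq / n_samples

if __name__ == "__main__":
    print("analytic:", analytic_moments(1.0, 6.0))
    print("sampled: ", sampled_moments(1.0, 6.0))
```

The sample estimate of the second moment converges slowly when $\mu$ is only slightly above 3, which is why a comfortably large exponent is used in the example call.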



Note that when $\mu < 2$ both moments diverge, and when $2 < \mu < 3$ the second moment diverges while the first one remains
finite. Thus, when $\mu$ is close to 2, $N_y$ can be very large, filling a significant part of the sequence $Q$ with the same
symbol. On the other hand, very large values of $N_y$ become rare for $\mu$ greater than 3, which makes the sequence highly
alternating.

It is well established that this method of building symbolic sequences generates long-range correlations between
elements of the sequence, characterized by a power-law correlation function (see the analytical development of Buiatti
et al. [10]). It was also shown that these correlations lead to a non-linear growth of the usual block entropy, i.e., the usual
entropy is not extensive for $\mu < 3$. In this context, these sequences were investigated in the framework of the
so-called non-extensive Tsallis statistical mechanics. In particular, it was shown that the Tsallis block entropy $S_q$ can
recover extensivity for a specific choice of the entropic index $q$ [10, 29].



III. DIFFUSIVE PROCESS 



As we raised in the introduction, long-term correlations or non-Markovian processes frequently present anomalous
properties when investigated in the context of diffusion. In this direction, constructing erratic trajectories from these
sequences may yield a rich diffusive scenario based on a simple model. This point has been noted by Buiatti et al.
and also briefly discussed by us in Ref. [29].

FIG. 1: Erratic trajectories $x(n)$ for three values of $\mu$ when considering the two-symbol alphabet $A = \{-1, 1\}$ and
$\Lambda = 1$. Note that, depending on the value of $\mu$, the trajectories are very different.

A simple and direct way to obtain the trajectories is to consider that each symbol in a sequence represents the
length of the jump of a particle in erratic movement. The position of the symbol plays the role of time. In this
manner, the variable

$$x(n) = \sum_{i=1}^{n} Q_i \qquad (5)$$

represents the position of the particle after a time $n$, which is an integer because of the construction.
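Equation (5) is simply a running sum over the symbols, which can be sketched as follows. The function name `trajectory` is a hypothetical choice for illustration; the input is any list of integer symbols.

```python
from itertools import accumulate

def trajectory(sequence):
    """Positions x(1), ..., x(N) of equation (5): running sum of the symbols."""
    return list(accumulate(sequence))

if __name__ == "__main__":
    # Two steps to the right followed by three steps to the left.
    print(trajectory([1, 1, -1, -1, -1]))  # prints [1, 2, 1, 0, -1]
```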

Let us start our investigation by considering the simplest symmetric case, i.e., the two-symbol alphabet $A = \{-1, 1\}$.
Thus, the particle is equally likely to jump to the right or to the left according to whether $Q_i = 1$ or $Q_i = -1$, and
the variable $x(n)$ represents a random-walk-like process where $x(1) = \pm 1$, $x(2) = 0, \pm 2$, $x(3) = \pm 3, \pm 1$ and so on,
with $n$ playing the role of time. Figure 1 illustrates $x(n)$ for three values of $\mu$ with $\Lambda = 1$. Note that the trajectories
are remarkably different depending on the value of $\mu$. For small values of $\mu$ ($\mu < 3$), we can see that the trajectories
are governed by two mechanisms: spatial localization and large jumps. The jumps are largest when $\mu < 2$, reflecting the
fact that all moments of $p(y)$ diverge for $\mu < 2$. On the other hand, the second moment of $p(y)$ is finite for $\mu > 3$ and
the trajectories are very similar to usual random walks.

As pointed out in the introduction, when dealing with diffusive processes, it is very common to investigate how the
particles spread by evaluating the variance $\sigma^2(n) = \langle (x(n) - \langle x(n) \rangle)^2 \rangle$, where the angle brackets denote an
ensemble average. The usual Brownian motion [30, 31] is characterized by $\sigma^2(n) \sim n$ and by a Gaussian propagator
$p(x, n) \sim n^{-1/2} \exp(-x^2/2n)$, which is a direct consequence of the central limit theorem and the Markovian nature of
the underlying stochastic process. On the other hand, anomalous diffusion is usually distinguished by
the value of the exponent $\alpha$ [32] in

$$\sigma^2(n) \propto n^{\alpha}. \qquad (6)$$

We have subdiffusion when $0 < \alpha < 1$ and superdiffusion when $\alpha > 1$. The crossover between subdiffusion and
superdiffusion, $\alpha = 1$, corresponds to the usual Brownian motion, and the case $\alpha = 2$ is the ballistic regime.

In this direction, we evaluate the variance for several values of $\mu$ over an ensemble average of $5 \times 10^5$ realizations,
as shown in Figure 2a. In a log-log plot, the slope of the curve $\sigma^2(n)$ versus $n$ is numerically equal to the exponent $\alpha$,
which visibly changes with the parameter $\mu$. In Figure 2b, we quantify this dependence by plotting $\alpha$ versus $\mu$.
From this figure, we have basically three diffusion regimes depending on the existence of the first, $\langle y \rangle$, and the second,
$\langle y^2 \rangle$, moments of $p(y)$: (i) a ballistic one for $\mu < 2$ ($\langle y \rangle \to \infty$ and $\langle y^2 \rangle \to \infty$), (ii) a superdiffusive one for $2 < \mu < 3$
($\langle y \rangle$ finite and $\langle y^2 \rangle \to \infty$) and (iii) the usual Brownian motion for $\mu > 3$ ($\langle y \rangle$ and $\langle y^2 \rangle$ finite).
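The procedure described above, estimating $\alpha$ as the slope of $\log \sigma^2(n)$ against $\log n$ over an ensemble of walkers, can be sketched on a much smaller scale. Everything below is an assumption of this sketch rather than the setup used for the figures: the function names, the ensemble of 1000 walkers, the sampling times $16, 32, \ldots, 1024$ and the ordinary least-squares fit. With such short walks the estimates are only indicative.

```python
import math
import random

def walker_positions(lam, mu, rng, times):
    """x(n) at the (sorted) sampling times for one walker on {-1, 1}."""
    x, n, out, idx = 0, 0, [], 0
    while idx < len(times):
        symbol = rng.choice((-1, 1))
        y = lam * ((1.0 - rng.random()) ** (-1.0 / (mu - 1.0)) - 1.0)
        block = int(y) + 1  # N_y of equation (1)
        # The block covers times n+1 ... n+block; record sampled positions.
        while idx < len(times) and times[idx] <= n + block:
            out.append(x + symbol * (times[idx] - n))
            idx += 1
        x += symbol * block
        n += block
    return out

def variance_exponent(lam, mu, n_walkers=1000, seed=0):
    """Least-squares slope of log(variance) versus log(n)."""
    rng = random.Random(seed)
    times = [2 ** k for k in range(4, 11)]  # 16, 32, ..., 1024
    s1 = [0.0] * len(times)
    s2 = [0.0] * len(times)
    for _ in range(n_walkers):
        for j, x in enumerate(walker_positions(lam, mu, rng, times)):
            s1[j] += x
            s2[j] += x * x
    log_n = [math.log(t) for t in times]
    log_v = [math.log(b / n_walkers - (a / n_walkers) ** 2)
             for a, b in zip(s1, s2)]
    mean_n = sum(log_n) / len(log_n)
    mean_v = sum(log_v) / len(log_v)
    num = sum((p - mean_n) * (q - mean_v) for p, q in zip(log_n, log_v))
    den = sum((p - mean_n) ** 2 for p in log_n)
    return num / den

if __name__ == "__main__":
    for mu in (1.5, 2.5, 6.0):
        print(mu, round(variance_exponent(1.0, mu), 2))
```

Generating the walk block by block avoids storing the full sequence, which matters when a single block can be very long for small $\mu$.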

Next, we investigate the role of the size of the symbol space by considering that more symbols are present in the
alphabet $A$. We consider first the presence of zeros, i.e., $A = \{-1, 0, 1\}$, where each symbol is equiprobable within
the sequence. The zero symbol allows the particles to stay motionless for a certain time, which could be related to
subdiffusion. However, as we show in Figure 3a, the presence of zeros in the sequence does not significantly change
the profile of the relation $\alpha$ versus $\mu$. Moreover, even if the zero symbol becomes more probable within the sequence,
i.e., $A = \{-1, 0^{T}, 1\}$ where $T$ is the number of zero symbols in the alphabet, this result remains valid, as we also show in
Figure 3a. Second, we study larger alphabets, from $A = \{-2, -1, 0, 1, 2\}$ to $A = \{-20, \ldots, -2, -1, 0, 1, 2, \ldots, 20\}$, and
the results are shown in Figure 3a. We found that the relation $\alpha$ versus $\mu$ does not depend on the size of the symbol
space. This relation is also robust to variations of the parameter $\Lambda$ and to non-symmetric alphabets. In particular,
the parameter $\Lambda$ only produces a multiplicative effect in equation (6), and a non-symmetric alphabet produces a
drift which does not affect the spreading of the system.

FIG. 2: Results concerning the alphabet $A = \{-1, 1\}$ for $\Lambda = 1$. (a) Logarithm of the variance $\sigma^2(n)$ versus logarithm of $n$
for $\mu = 1.8$ (squares), $\mu = 2.4$ (circles), $\mu = 2.8$ (triangles) and $\mu = 3.4$ (diamonds). In this figure, the slopes of the curves
are numerically equal to the exponents $\alpha$ and the straight lines are linear fits to the data used to obtain the values of $\alpha$. (b)
The dependence of the exponent $\alpha$ on $\mu$. From this figure, we have basically three diffusion regimes: a ballistic one ($\mu < 2$), a
superdiffusive one ($2 < \mu < 3$) and the usual Brownian motion ($\mu > 3$). The straight lines are the predictions of the continuous-time
random walk model related to equation (9).

FIG. 3: (a) The dependence of the exponent $\alpha$ on $\mu$ when considering the alphabet $A = \{-1, 0, 1\}$ and varying the probability
of the zero symbol within the sequence. For the squares the symbols are equiprobable and for the circles the zero symbol is
twenty times more likely to occur ($A = \{-1, 0^{20}, 1\}$). The triangles and the diamonds are the results concerning the larger
alphabets $A = \{-2, -1, 0, 1, 2\}$ and $A = \{-20, \ldots, -2, -1, 0, 1, 2, \ldots, 20\}$, respectively. (b) The dependence of the exponent $\alpha$
on $\mu_z$ when considering the equiprobable alphabet $A = \{-1, 0, 1\}$ and $\mu_j = 6$. Note the presence of a subdiffusive regime for
$1 < \mu_z < 2$. All results were obtained by using sequences of length $10^7$ averaged over $5 \times 10^5$ realizations with $\Lambda = 1$. The
straight lines are the predictions of the continuous-time random walk model related to equations (9) and (10), respectively.

Until now we were not able to generate subdiffusion, even when making the zero symbol more likely to occur. This result
suggests that only superdiffusion can emerge when considering the same value of $\mu$ for the symbols that lead to jumps
and for the zero symbol that leads to absence of motion. The reason for this behavior is that, even when
the zero symbol is much more likely than the other symbols, the number of repetitions $N_y$ is independent of the
symbol. Thus, the particles can remain at rest for a long time but the flights can be equally long, since for $\mu < 3$
there is no characteristic scale for $N_y$.

In this direction, let us consider a sequence where the jumping symbols are related to a value $\mu_j$ ($\mu_j > 3$) and the
zero symbol is related to another value $\mu_z$. In this manner, the flights have a characteristic scale while the rest periods
may or may not have this characteristic scale (depending on the value of $\mu_z$). The results concerning this scenario are
shown in Figure 3b, where we show the dependence of $\alpha$ on $\mu_z$ for a fixed value of $\mu_j = 6$ (different values of $\mu_j > 3$
do not affect this relation). From this figure, we can identify three diffusive regimes: no diffusion for $\mu_z \approx 1$, where
the sequence is practically filled by zeros, subdiffusion for $1 < \mu_z < 2$ and usual diffusion for $\mu_z > 2$.
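The two-exponent construction just described can be sketched by choosing the exponent according to the symbol that was drawn. As before, this is a hedged illustration: the names `block_length` and `two_exponent_sequence`, the arguments `mu_j` and `mu_z`, and the truncation of the last block are assumptions of this sketch, with the equiprobable alphabet $\{-1, 0, 1\}$ hard-coded.

```python
import random

def block_length(lam, mu, rng):
    """N_y = [y] + 1 with y sampled as in equation (1)."""
    y = lam * ((1.0 - rng.random()) ** (-1.0 / (mu - 1.0)) - 1.0)
    return int(y) + 1

def two_exponent_sequence(length, lam, mu_j, mu_z, seed=None):
    """Sequence on {-1, 0, 1}: jumps use mu_j, rest periods use mu_z."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < length:
        symbol = rng.choice((-1, 0, 1))
        mu = mu_z if symbol == 0 else mu_j
        # The last block is truncated to keep exactly `length` symbols.
        repeats = min(block_length(lam, mu, rng), length - len(sequence))
        sequence.extend([symbol] * repeats)
    return sequence

if __name__ == "__main__":
    seq = two_exponent_sequence(5000, lam=1.0, mu_j=6.0, mu_z=1.5, seed=3)
    print("fraction of zeros:", seq.count(0) / len(seq))
```

With $\mu_j > 3$ the jump blocks stay short, while a small $\mu_z$ produces long rest periods, which is the mechanism the text associates with subdiffusion.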

We also evaluated the probability density functions (pdf) of $x(n)$ to investigate the shape of the distribution $p(x, n)$
for different values of $\mu$ as well as for the different constructions of the erratic trajectories. Figure 4 shows these
distributions for the equiprobable alphabets $A = \{-1, 1\}$ (upper panel), $A = \{-11, \ldots, 0, \ldots, 11\}$ (middle panel) and
the alphabet $A = \{-1, 0^{10}, 1\}$ with the zero symbol being ten times more probable than the others (lower panel). We
can see that the distributions are characterized by non-Gaussian profiles with heavy tails when $\mu < 3$, recovering the
Gaussian propagator when $\mu > 3$. Further, a visual inspection suggests that the different constructions of the erratic
trajectories only change the scale of these plots.

FIG. 4: Probability density function of $x(n)$ for the values of $\mu$ (indicated in the figures) for three values of $n$: $10^7$ (squares),
$2 \times 10^6$ (circles) and $714 \times 10^3$ (triangles), when considering the equiprobable alphabets $A = \{-1, 1\}$ (upper panel),
$A = \{-11, \ldots, 0, \ldots, 11\}$ (middle panel) and the alphabet $A = \{-1, 0^{10}, 1\}$ where the zero symbol is ten times more probable than
the others (lower panel). The histograms were obtained by using sequences of length $10^7$ with $\Lambda = 1$ and $5 \times 10^5$ realizations
of the numerical experiment.

The situation is remarkably different when considering one value of $\mu$ for the jumping symbols and another for the
zero symbol with the alphabet $A = \{-1, 0, 1\}$. Figure 5 shows the distributions for this case. Notice that the shape
of the distributions goes from a Laplace distribution ($p(x) \sim \exp(-|x|)$) to a Gaussian distribution, depending on the value of $\mu_z$.



IV. CONTINUOUS-TIME RANDOM WALK MODELS 



So far we have empirically described the diffusive behavior of the symbolic sequences proposed by Buiatti et al.
Now let us compare these empirical findings with some analytical models based on the continuous-time random walk.

In the continuous-time random walk (CTRW) of Montroll [19], the random walk process is fully
specified by the function $\psi(x, t)$, the probability density to move a distance $x$ in time $t$. We can distinguish three
different ways to make the movement: the particle waits until it moves instantaneously to a new position (jump
model), or the particle moves at constant velocity to a new position and randomly chooses a new direction (velocity
model), or the particle moves at constant velocity between turning points that are chosen randomly [21]. There are two
fundamental approaches to the CTRW: (i) the decoupled and (ii) the coupled formalisms. In (i) the function $\psi(x, t)$
is supposed to factor in the form $\psi(x, t) = w(t)\lambda(x)$, i.e., the jump length and the waiting time are independent random
variables. In (ii) both processes are coupled: a jump of a certain length may involve a time cost or vice versa. This
coupled form commonly leads to more cumbersome calculations.

FIG. 5: Probability density function of $x(n)$ for the values of $\mu_z$ (indicated in the figures) for three values of $n$: $10^7$ (squares),
$2 \times 10^6$ (circles) and $714 \times 10^3$ (triangles). The histograms were obtained by using sequences of length $10^7$ with $\Lambda = 1$, $\mu_j = 6$
and $5 \times 10^5$ realizations of the numerical experiment with the alphabet $A = \{-1, 0, 1\}$.

Here, we notice that because of the way the erratic trajectories are constructed, every continuous jump (without changing the
sequence symbol) of length $N_y$ occurs at constant velocity and costs the same $N_y$ units of time to be performed.
This fact leads us to the coupled velocity model when considering the equiprobable alphabet $A = \{-1, 1\}$. When
adding the zero symbol, the resting times are decoupled from the jumps, but the jumping times stay coupled. In
addition, we have to remember that our formal time $n$ is a discrete variable. Thus, comparisons with this formalism
should be viewed as semi-quantitative. In this context, it is interesting to note that the work of Gorenflo et al. [23, 24]
extends Montroll's theory to the discrete domain considering the decoupled version of the CTRW, in contrast to
the first approach used here.

We start by considering the coupled velocity model of Zumofen and Klafter [22] to compare with our
sequences. For this case

$$\psi(x, t) = \frac{1}{2}\, \delta(|x| - t)\, w(t), \qquad (7)$$

where $w(t) \sim t^{-\gamma-1}$. Note that in this CTRW model long jumps are always penalized by long waiting times due to
the presence of the $\delta$ function. Furthermore, because of the asymptotic behavior $p(y) \sim y^{-\mu}$, $\gamma$ should be related
to $\mu$ via $\gamma = \mu - 1$. Now, following the approach of Zumofen and Klafter, we can write the distribution
$p(x, t)$ in Fourier-Laplace space as

$$p(k, u) = \frac{\Psi(k, u)}{1 - \psi(k, u)}, \qquad (8)$$

where

$$\Psi(x, t) = \frac{1}{2}\, \delta(|x| - t) \int_t^{\infty} w(t')\, dt'$$

is the probability density to move a distance $x$ in time $t$ in a single event and not necessarily stop at $x$. By using
$p(k, u)$, we can evaluate the variance

$$\sigma^2(t) = \mathcal{L}^{-1} \left\{ -\frac{\partial^2}{\partial k^2}\, p(k, u) \bigg|_{k=0} \right\} \sim \begin{cases} t^{2}, & 1 < \mu < 2, \\ t^{4-\mu}, & 2 < \mu < 3, \\ t, & \mu > 3. \end{cases} \qquad (9)$$

FIG. 6: Probability density function of the scaling variable $\xi = c x / t^{1/\gamma}$ for some values of $\mu$ (indicated in the figure) for five
values of $n$: $10^7$ (squares), $2 \times 10^6$ (circles), $714 \times 10^3$ (triangles), $55 \times 10^3$ (diamonds) and $15 \times 10^3$ (crosses). The upper
panel shows the results for the equiprobable alphabet $A = \{-1, 1\}$, the middle panel for $A = \{-11, \ldots, 0, \ldots, 11\}$ and the lower
panel for $A = \{-1, 0^{10}, 1\}$ with the zero symbol being ten times more probable than the others. The continuous lines are the
predictions of the continuous-time random walk model, equation (11). The values of $c$ and $\gamma$ are indicated in the figure
and the numerical data were obtained from $5 \times 10^5$ realizations of equation (5) with $\Lambda = 1$.
[Panel labels: $\mu = 2.3$, $\gamma = 1.34$, $c = 1.20$; $\mu = 2.6$, $\gamma = 1.61$, $c = 0.80$; $\mu = 2.9$, $\gamma = 1.79$, $c = 0.87$; $\mu = 2.6$, $\gamma = 1.45$, $c = 3.00$.]

FIG. 7: Probability density function of the scaling variable $\xi = c x / t^{\gamma/2}$ for some values of $\mu_z$ (indicated in the figure) for five
values of $n$: $10^7$ (squares), $2 \times 10^6$ (circles), $714 \times 10^3$ (triangles), $55 \times 10^3$ (diamonds) and $15 \times 10^3$ (crosses) when considering
the equiprobable alphabet $A = \{-1, 0, 1\}$ and $\mu_j = 6$. The continuous line is the prediction of the continuous-time random
walk model, equation (12), with $\gamma = 0.47$ ($c = 0.65$) for $\mu_z = 1.5$, $\gamma = 0.64$ ($c = 0.37$) for $\mu_z = 1.7$, and $\gamma = 0.94$ ($c = 0.15$) for
$\mu_z = 1.9$. The numerical data were obtained from $5 \times 10^5$ realizations of the numerical experiment with $\Lambda = 1$.

FIG. 8: (a) The averaged value of $\gamma$ versus $\mu$ obtained via the procedure described in the text for the equiprobable alphabet
$A = \{-1, 1\}$. (b) The averaged value of $\gamma$ versus $\mu_z$ when considering the equiprobable alphabet $A = \{-1, 0, 1\}$ and $\mu_j = 6$.
Notice that the relation $\gamma = \mu - 1$ or $\gamma = \mu_z - 1$, obtained by comparing $w(t)$ and $p(y)$, holds in general. The error bars are
calculated via the bootstrap resampling method [35].
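The piecewise scaling of equation (9) translates directly into a predicted diffusion exponent $\alpha(\mu)$ for the coupled velocity model. The helper below is a small sketch for comparing against fitted slopes; the function name is hypothetical, and assigning the marginal values $\mu = 2$ and $\mu = 3$ by continuity is a choice made here, since equation (9) only states the open intervals.

```python
def alpha_velocity_model(mu):
    """Exponent alpha implied by equation (9) for the coupled velocity model."""
    if mu <= 1.0:
        raise ValueError("the model requires mu > 1")
    if mu < 2.0:
        return 2.0       # ballistic regime
    if mu < 3.0:
        return 4.0 - mu  # superdiffusive regime
    return 1.0           # usual diffusion

if __name__ == "__main__":
    for mu in (1.8, 2.4, 2.8, 3.4):
        print(mu, alpha_velocity_model(mu))
```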



Figure 2b shows the comparison with numerical data for the alphabet $A = \{-1, 1\}$ and Figure 3a does the same for the
alphabet $A = \{-1, 0, 1\}$ (with the zero symbol more probable) and also for the larger alphabets $A = \{-2, -1, 0, 1, 2\}$
and $A = \{-20, \ldots, -2, -1, 0, 1, 2, \ldots, 20\}$. Naturally, the agreement is better for the first case than for the others, since
it fulfills the requirements of the model. In general, we can see that the presence of the zero symbol makes the
convergence of $\alpha$ to the limiting regimes ($\alpha = 2$ and $\alpha = 1$) slower.

We consider another CTRW model in an attempt to reproduce the subdiffusive regime. Specifically, we employ the decoupled
version proposed by Montroll [33], where $\lambda(x) \sim \exp(-x^2)$ and $w(t) \sim t^{-\gamma-1}$ ($0 < \gamma < 1$). Following Montroll [33]
or also Metzler and Klafter [20], we obtain

$$\sigma^2(t) \sim t^{\mu_z - 1}, \qquad (10)$$

where again we have used the relation $\gamma = \mu_z - 1$. Figure 3b compares the numerical data with this expression, and
we can see good agreement.

Additionally, we may also obtain the propagator from a small-$(k, u)$ expansion for both previous cases. For the first
one, $p(k, u) \sim 1/(u + c|k|^{\gamma})$, which for $2 < \mu < 3$ yields

$$p(x, t) \sim \begin{cases} t^{-1/\gamma} L_{\gamma}(\xi), & |x| < t, \\ 0, & |x| > t, \end{cases} \qquad (11)$$

where $L_{\gamma}(\xi)$ is the Lévy stable distribution and $\xi = c x / t^{1/\gamma}$ is the scaling variable. For the second one,
$p(k, u) \sim 1/(u + c\, k^2 u^{1-\gamma})$, leading to

$$p(x, t) \sim t^{-\gamma/2}\, H^{2,0}_{1,2} \left[ \xi^2 \,\Bigg|\, \begin{matrix} (1 - \gamma/2, \gamma) \\ (0, 1), (1/2, 1) \end{matrix} \right], \qquad (12)$$

where $H^{2,0}_{1,2}$ is the Fox H function [34] and $\xi = c x / t^{\gamma/2}$ is the scaling variable.
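As a consistency check, the subdiffusive scaling of equation (10) can be recovered from the small-$(k, u)$ form of the decoupled model. The sketch below assumes that form to be $p(k, u) \sim 1/(u + c\, k^2 u^{1-\gamma})$ with a constant $c$, and uses the standard inverse Laplace transform of a power of $u$.

```latex
\begin{align*}
p(k,u) &\sim \frac{1}{u + c\,k^{2}u^{1-\gamma}}, \\
\sigma^{2}(u) &= -\frac{\partial^{2}}{\partial k^{2}}\,p(k,u)\bigg|_{k=0}
 \sim 2c\,u^{-1-\gamma}, \\
\sigma^{2}(t) &= \mathcal{L}^{-1}\left\{\sigma^{2}(u)\right\}
 \sim \frac{2c}{\Gamma(1+\gamma)}\,t^{\gamma}.
\end{align*}
```

With $\gamma = \mu_z - 1$ this gives $\sigma^2(t) \sim t^{\mu_z - 1}$, in line with equation (10).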

Figure 6 shows the comparison for the first case and Figure 7 for the second one. In both cases we can see a
good-quality data collapse when the scaling is performed. Moreover, these figures show
good agreement between the numerical data and the CTRW models. The fitting parameter $\gamma$ of each case was obtained by
minimizing the difference between the numerical data and the analytic expressions using the nonlinear least squares
method. In all these figures we have employed the averaged value of $\gamma$ obtained from applying the method for 17
values of $n$ chosen logarithmically spaced from $10^3$ to $10^7$. In addition, Figures 8a and 8b show the dependence of
the averaged value of $\gamma$ on $\mu$ and on $\mu_z$ for both cases, showing that the relation $\gamma = \mu - 1$ or $\gamma = \mu_z - 1$ is consistent
with the numerical data.
V. SUMMARY

Summing up, we verified that the symbolic model presented by Buiatti et al. [10] gives rise to a rich diffusive
scenario. Depending on the parameter $\mu$ (or $\mu_z$), different anomalous diffusive regimes can emerge. Specifically, we
have found subdiffusive, superdiffusive, ballistic, and usual regimes, depending on the model parameters and also
on the construction of the trajectories. We also investigated the probability distributions of these processes, where
non-Gaussian profiles were observed. Our findings support the existence of self-similarity in the data, given the good quality of
the data collapse when the scaling is performed. In addition, the numerical data were compared with predictions of
the CTRW framework, finding good agreement. We believe that our empirical findings may help in modeling systems
for which power laws are present, as well as motivate other random walk constructions based on symbolic sequences.